Changes to TRT-LLM download tool for multigpu distributed case #3830

apbose · 2025-09-22T06:33:45Z

TRT-LLM installation tool for the distributed case

Only one GPU performs the download, to avoid redundant downloads.
The tool uses a barrier for this purpose.
The utility functions for TRT-LLM installation are moved to dynamo/distributed/utils.py, with initialization and cleanup of the environment left to the user. (We do this in tests/py/dynamo/distributed and examples/distributed_inference/tensor_parallel_initialize_dist.)
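
For readers unfamiliar with the pattern, here is a minimal sketch of "one rank downloads, the rest wait at a barrier". It is not the code in this PR: `ensure_downloaded`, `run_demo`, `fake_download`, and the `plugin.so` file name are hypothetical. Ranks are simulated with threads and `threading.Barrier` so the sketch runs without GPUs.

```python
import tempfile
import threading
from pathlib import Path
from typing import Callable, List, Tuple


def ensure_downloaded(
    rank: int,
    barrier: Callable[[], object],
    download: Callable[[Path], None],
    target: Path,
) -> Path:
    """Hypothetical helper: rank 0 fetches `target`; all ranks sync before using it."""
    try:
        if rank == 0 and not target.exists():
            download(target)
    finally:
        # Reached even if the download raises, so other ranks are not left waiting.
        barrier()
    return target


def run_demo(world_size: int = 4) -> Tuple[int, List[bool]]:
    """Simulate ranks with threads; return (download count, per-rank 'file visible')."""
    calls: List[int] = []
    seen = [False] * world_size
    sync = threading.Barrier(world_size)

    def fake_download(path: Path) -> None:
        # Stand-in for the real download; writes a placeholder file.
        calls.append(1)
        path.write_bytes(b"placeholder")

    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "plugin.so"  # hypothetical file name

        def worker(rank: int) -> None:
            path = ensure_downloaded(rank, sync.wait, fake_download, target)
            seen[rank] = path.exists()

        threads = [
            threading.Thread(target=worker, args=(r,)) for r in range(world_size)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    return len(calls), seen


if __name__ == "__main__":
    print(run_demo(4))
```

In a real multi-GPU job one would presumably pass `torch.distributed.get_rank()` and `torch.distributed.barrier` instead of the thread-based stand-ins, after the user has initialized the process group. On multi-node setups without a shared filesystem, the check would likely need to use the local rank on each node instead.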

github-actions

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/utils.py	2025-09-22 06:35:28.523784+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/utils.py	2025-09-22 06:36:00.657186+00:00
@@ -863,6 +863,6 @@
    return False


def is_thor() -> bool:
    if torch.cuda.get_device_capability() in [(11, 0)]:
-        return True
\ No newline at end of file
+        return True

github-actions

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/distributed/utils.py	2025-09-25 19:33:28.176615+00:00
+++ /home/runner/work/TensorRT/TensorRT/py/torch_tensorrt/dynamo/distributed/utils.py	2025-09-25 19:34:02.325958+00:00
@@ -100,11 +100,10 @@
        return True

    except Exception as e:
        logger.warning(f"Failed to detect CUDA version: {e}")
        return False
-

    return True


def _cache_root() -> Path:

github-actions

There are some changes that do not conform to Python style guidelines:

--- /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/distributed/test_nccl_ops.py	2025-10-13 18:42:31.890493+00:00
+++ /home/runner/work/TensorRT/TensorRT/tests/py/dynamo/distributed/test_nccl_ops.py	2025-10-13 18:43:14.026641+00:00
@@ -23,10 +23,11 @@
if not dist.is_initialized():
    dist.init_process_group(
        backend="nccl",
        init_method="env://",
    )
+

class DistributedGatherModel(nn.Module):
    def __init__(self, input_dim, world_size, group_name):
        super().__init__()
        self.fc = nn.Linear(input_dim, input_dim)

github-actions bot added the component: tests, component: conversion, component: api [Python], and component: dynamo labels Sep 22, 2025

meta-cla bot added the cla signed label Sep 22, 2025

github-actions bot requested a review from peri044 September 22, 2025 06:34

github-actions bot requested changes Sep 22, 2025

apbose mentioned this pull request Sep 22, 2025

Changes to TRT-LLM download tool for multigpu distributed case #3784

Closed

apbose force-pushed the abose/trt_llm_installation_dist branch from 6e99bbc to 7134053 Compare September 22, 2025 22:47

github-actions bot added the component: build system label Sep 25, 2025

apbose force-pushed the abose/trt_llm_installation_dist branch from 3f1fa7e to 54948d9 Compare September 25, 2025 19:33

github-actions bot requested changes Sep 25, 2025

apbose force-pushed the abose/trt_llm_installation_dist branch 3 times, most recently from 2bbc423 to 5beefc0 Compare September 25, 2025 22:13

apbose changed the title ~~Changes to TRT-LLM download tool for multigpu distributed case~~ Changes to TRT-LLM download tool for multigpu distributed case [WIP] Sep 25, 2025

apbose force-pushed the abose/trt_llm_installation_dist branch from 5beefc0 to 809c7ee Compare September 26, 2025 00:11

apbose changed the title ~~Changes to TRT-LLM download tool for multigpu distributed case [WIP]~~ Changes to TRT-LLM download tool for multigpu distributed case Sep 26, 2025

apbose force-pushed the abose/trt_llm_installtion branch 5 times, most recently from b96b9ee to 2f2cd31 Compare October 7, 2025 17:27

apbose force-pushed the abose/trt_llm_installation_dist branch from 809c7ee to 5fb74da Compare October 11, 2025 02:04

github-actions bot added the documentation, component: lowering, component: converters, component: runtime, and component: torch_compile labels Oct 11, 2025

apbose changed the base branch from abose/trt_llm_installtion to main October 11, 2025 02:05

github-actions bot requested changes Oct 13, 2025

apbose force-pushed the abose/trt_llm_installation_dist branch 2 times, most recently from d289587 to bd02455 Compare October 13, 2025 20:05

apbose added 4 commits October 13, 2025 13:38

Changes to TRT-LLM download tool for multigpu distributed case

d78751e

Distributed utils package, separating out env for single GPU and multiGPU

56b80db

changes to account for the base branch change

c7bf852

the barrier for TRT-LLM installation

38224c5

apbose force-pushed the abose/trt_llm_installation_dist branch from bd02455 to 38224c5 Compare October 13, 2025 20:38
